home *** CD-ROM | disk | FTP | other *** search
-
- ────────────────────────────────────────────────────────────────────────────
- VLA 3/93 Introduction to ASSEMBLER
- ────────────────────────────────────────────────────────────────────────────
-
- Here's something to help those of you who were having trouble understanding
- the instructional programs we released. Dreaden made these files for the
- Kabal and myself when we were just learning. These files go over some of
- the basic concepts of assembler. Bonus of bonuses. These files also have
- programs imbedded in them. Most of them have a ton of comments so even
- the beginning programmers should be able to figure them out.
-
- If you'd like to learn more, post a message on Phantasm. We need to know
- where you're interests are before we can make more files to bring out the
- little programmers that are hiding inside all of us.
-
- Lithium/VLA
-
- ────────────────────────────────────────────────────────────────────────────
-
- First thing ya need to know is a little jargon so you can talk about
- the basic data structures with your friends and neighbors. They are (in
- order of increasing size) BIT, NIBBLE, BYTE, WORD, DWORD, FWORD, PWORD and
- QWORD, PARA, KiloByte, MegaByte. The ones that you'll need to memorize are
- BYTE, WORD, DWORD, KiloByte, and MegaByte. The others aren't used all that
- much, and you wont need to know them to get started. Here's a little
- graphical representation of a few of those data structures:
-
- (The zeros in between the || is a graphical representation of the number of
- bits in that data structure.)
-
- ──────
- 1 BIT : |0|
-
- The simplest piece of data that exists. Its either a 1 or a zero.
- Put a string of them together and you have a BASE-2 number system.
- Meaning that instead of each 'decimal' place being worth 10, its only
- worth 2. For instance: 00000001 = 1; 00000010 = 2; 00000011 = 3, etc..
-
- ──────
- 1 NIBBLE: |0000|
- 4 BITs
-
- The NIBBLE is half a BYTE or four BITS. Note that it has a maximum value
- of 15 (1111 = 15). Not by coincidence, HEXADECIMAL, a base 16 number
- system (computers are based on this number system) also has a maximum
- value of 15, which is represented by the letter 'F'. The 'digits' in
- HEXADECIMAL are (in increasing order):
-
- "0123456789ABCDEF"
-
- The standard notation for HEXADECIMAL is a zero followed by the number
- in HEX followed by a lowercase "h" For instance: "0FFh" = 255 DECIMAL.
-
- ──────
- 1 BYTE |00000000|
- 2 NIBBLEs └─ AL ─┘
- 8 BITs
-
- The BYTE is the standard chunk of information. If you asked how much
- memory a machine had, you'd get a response stating the number of BYTEs it
- had. (Usually preceded by a 'Mega' prefix). The BYTE is 8 BITs or
- 2 NIBBLEs. A BYTE has a maximum value of 0FFh (= 255 DECIMAL). Notice
- that because a BYTE is just 2 NIBBLES, the HEXADECIMAL representation is
- simply two HEX digits in a row (ie. 013h, 020h, 0AEh, etc..)
-
- The BYTE is also that size of the 'BYTE sized' registers - AL, AH, BL, BH,
- CL, CH, DL, DH.
-
- ──────
- 1 WORD |0000000000000000|
- 2 BYTEs └─ AH ─┘└─ AL ─┘
- 4 NIBBLEs └───── AX ─────┘
- 16 BITs
-
- The WORD is just two BYTEs that are stuck together. A word has a maximum
- value of 0FFFFh (= 65,535). Since a WORD is 4 NIBBLEs, it is represented
- by 4 HEX digits. This is the size of the 16bit registers on the 80x86
- chips. The registers are: AX, BX, CX, DX, DI, SI, BP, SP, CS, DS, ES, SS,
- and IP. Note that you cannot directly change the contents of IP or CS in
- any way. They can only be changed by JMP, CALL, or RET.
-
- ──────
- 1 DWORD
- 2 WORDs |00000000000000000000000000000000|
- 4 BYTEs │ └─ AH ─┘└─ AL ─┘
- 8 NIBBLEs │ └───── AX ─────┘
- 32 BITs └──────────── EAX ─────────────┘
-
- A DWORD (or "DOUBLE WORD") is just two WORDs, hence the name DOUBLE-WORD.
- This can have a maximum value of 0FFFFFFFFh (8 NIBBLEs, 8 'F's) which
- equals 4,294,967,295. Damn large. This is also the size or the 386's
- 32bit registers: EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, EIP. The 'E '
- denotes that they are EXTENDED registers. The lower 16bits is where the
- normal 16bit register of the same name is located. (See diagram.)
-
- ──────
- 1 KILOBYTE |-lots of zeros (8192 of 'em)-|
- 256 DWORDs
- 512 WORDs
- 1024 BYTEs
- 2048 NIBBLEs
- 8192 BITs
-
- We've all heard the term KILOBYTE byte, before, so I'll just point out
- that a KILOBYTE, despite its name, is -NOT- 1000 BYTEs. It is actually
- 1024 bytes.
-
- ──────
- 1 MEGABYTE |-even more zeros (8,388,608 of 'em)-|
- 1,024 KILOBYTEs
- 262,144 DWORDs
- 524,288 WORDs
- 1,048,576 BYTEs
- 2,097,152 NIBBLEs
- 8,388,608 BITs
-
- Just like the KILOBYTE, the MEGABYTE is -NOT- 1 million bytes. It is
- actually 1024*1024 BYTEs, or 1,048,578 BYTEs
-
- ──────────────────────────────
-
- Now that we know what the different data types are, we will investigate
- an annoying little aspect of the 80x86 processors. I'm talking about
- nothing other than SEGMENTS & OFFSETS!
-
-
- SEGMENTS & OFFSETS:
- ──────────────────
- Pay close attention, because this topic is (I believe) the single most
- difficult (or annoying, once you understand it) aspect of ASSEMBLER.
-
- An OverView:
-
- The original designers of the 8088, way back when dinasaurs ruled the
- planet, decided that no one would ever possibly need more than one MEG
- (short for MEGABYTE :) of memory. So they built the machine so that it
- couldn't access above 1 MEG. To access the whole MEG, 20 BITs are needed.
- Problem was that the registers only had 16 bits, and if the used two
- registers, that would be 32 bits, which was way too much (they thought.)
- So they came up with a rather brilliant (blah) way to do their addressing-
- they would use two registers. They decided that they would not be 32bits,
- but the two registers would create 20 bit addressing. And thus Segments
- and OFfsets were born. And now the confusing specifics.
-
- ──────────────────
-
- OFFSET = SEGMENT*16
- SEGMENT = OFFSET /16 ;note that the lower 4 bits are lost
-
-
- SEGMENT * 16 |0010010000010000----| range (0 to 65535) * 16
- +
- OFFSET |----0100100000100010| range (0 to 65535)
- =
- 20 bit address |00101000100100100010| range 0 to 1048575 (1 MEG)
- └───── DS ─────┘
- └───── SI ─────┘
- └─ Overlap─┘
-
- This shows how DS:SI is used to construct a 20 bit address.
-
- Segment registers are: CS, DS, ES, SS. On the 386+ there are also FS & GS
-
- Offset registers are: BX, DI, SI, BP, SP, IP. In 386+ protected mode, ANY
- general register (not a segment register) can be used as an Offset
- register. (Except IP, which you can't access.)
-
- CS:IP => Points to the currently executing code.
- SS:SP => Points to the current stack position.
-
- ──────────────────
-
- If you'll notice, the value in the SEGMENT register is multiplied by
- 16 (or shifted left 4 bits) and then added to the OFFSET register.
- Together they create a 20 bit address. Also Note that there are MANY
- combinations of the SEGMENT and OFFSET registers that will produce the
- same address. The standard notation for a SEGment/OFFset pair is:
-
- ────
- SEGMENT:OFFSET or A000:0000 ( which is, of course in HEX )
-
- Where SEGMENT = 0A000h and OFFSET = 00000h. (This happens to be the
- address of the upper left pixel on a 320x200x256 screen.)
- ────
-
- You may be wondering what would happen if you were to have a segment
- value of 0FFFFh and an offset value of 0FFFFh.
-
- Take notice: 0FFFFh * 16 (or 0FFFF0h ) + 0FFFFh = 1,114,095, which is
- definately larger than 1 MEG (which is 1,048,576.)
-
- This means that you can actually access MORE than 1 meg of memory!
- Well, to actually use that extra bit of memory, you would have to enable
- something called the A20 line, which just enables the 21st bit for
- addressing. This little extra bit of memory is usually called
- "HIGH MEMORY" and is used when you load something into high memory or
- say DOS = HIGH in your AUTOEXEC.BAT file. (HIMEM.SYS actually puts it up
- there..) You don't need to know that last bit, but hey, knowledge is
- good, right?
-
- ──────────────────────
- THE REGISTERS:
- ──────────────────────
-
- I've mentioned AX, AL, and AH before, and you're probably wondering what
- exactly they are. Well, I'm gonna go through one by one and explain
- what each register is and what it's most common uses are. Here goes:
-
- ────
- AX (AH/AL):
- AX is a 16 bit register which, as metioned before, is merely two bytes
- attached together. Well, for AX, BX, CX, & DX you can independantly
- access each part of the 16 bit register through the 8bit (or byte sized)
- registers. For AX, they are AL and AH, which are the Low and High parts
- of AX, respectivly. It should be noted that any change to AL or AH,
- will change AX. Similairly any changes to AX may or may not change AL and
- AH. For instance:
-
- ────────
- Let's suppose that AX = 00000h (AH and AL both = 0, too)
-
- mov AX,0
- mov AL,0
- mov AH,0
-
- Now we set AL = 0FFh.
-
- mov AL,0FFh
-
- :AX => 000FFh ;I'm just showing ya what's in the registers
- :AL => 0FFh
- :AH => 000h
-
- Now we increase AX by one:
-
- INC AX
-
- :AX => 00100h (= 256.. 255+1= 256)
- :AL => 000h (Notice that the change to AX changed AL and AH)
- :AH => 001h
-
- Now we set AH = 0ABh (=171)
-
- mov AH,0ABh
-
- :AX => 0AB00h
- :AL => 000h
- :AH => 0ABh
-
- Notice that the first example was just redundant...
- We could've set AX = 0 by just doing
-
- mov ax,0
-
- :AX => 00000h
- :AL => 000h
- :AH => 000h
-
- I think ya got the idea...
- ────────
-
- SPECIAL USES OF AX:
- Used as the destination of an IN (in port)
- ex: IN AL,DX
- IN AX,DX
-
- Source for the output for an OUT
- ex: OUT DX,AL
- OUT DX,AX
-
- Destination for LODS (grabs byte/word from [DS:SI] and INCreses SI)
- ex: lodsb (same as: mov al,[ds:si] ; inc si )
- lodsw (same as: mov ax,[ds:si] ; inc si ; inc si )
-
- Source for STOS (puts AX/AL into [ES:DI] and INCreses DI)
- ex: stosb (same as: mov [es:di],al ; inc di )
- stosw (same as: mov [es:di],ax ; inc di ; inc di )
-
- Used for MUL, IMUL, DIV, IDIV
- ────
- BX (BH/BL): same as AX (BH/BL)
-
- SPECIAL USES:
- As mentioned before, BX can be used as an OFFSET register.
- ex: mov ax,[ds:bx] (grabs the WORD at the address created by
- DS and BX)
-
- CX (CH/CL): Same as AX
-
- SPECIAL USES:
- Used in REP prefix to repeat an instruction CX number of times
- ex: mov cx,10
- mov ax,0
- rep stosb ;this would write 10 zeros to [ES:DI] and increase
- ;DI by 10.
- Used in LOOP
- ex: mov cx,100
- THELABEL:
-
- ;do something that would print out 'HI'
-
- loop THELABEL ;this would print out 'HI' 100 times
- ;the loop is the same as: dec cx
- jne THELABAL
-
- DX (DH/DL): Same as above
- SPECIAL USES:
- USED in word sized MUL, DIV, IMUL, IDIV as DEST for high word
- or remainder
-
- ex: mov bx,10
- mov ax,5
- mul bx ;this multiplies BX by AX and puts the result
- ;in DX:AX
-
- ex: (continue from above)
- div bx ;this divides DX:AX by BX and put the result in AX and
- ;the remainder (in this case zero) in DX
-
- Used as address holder for IN's, and OUT's (see ax's examples)
-
- INDEX REGISTERS:
-
- DI: Used as destination address holder for stos, movs (see ax's examples)
- Also can be used as an OFFSET register
-
- SI: Used as source address holder for lods, movs (see ax's examples)
- Also can be used as OFFSET register
-
- Example of MOVS:
- movsb ;moves whats at [DS:SI] into [ES:DI] and increases
- movsw ; DI and SI by one for movsb and 2 for movsw
-
- NOTE: Up to here we have assumed that the DIRECTION flag was cleared.
- If the direction flag was set, the DI & SI would be DECREASED
- instead of INCREASED.
- ex: cld ;clears direction flag
- std ;sets direction flag
-
- STACK RELATED INDEX REGISTERS:
- BP: Base Pointer. Can be used to access the stack. Default segment is
- SS. Can be used to access data in other segments throught the use
- of a SEGMENT OVERRIDE.
-
- ex: mov al,[ES:BP] ;moves a byte from segment ES, offset BP
- Segment overrides are used to specify WHICH of the 4 (or 6 on the
- 386) segment registers to use.
-
- SP: Stack Pointer. Does just that. Segment overrides don't work on this
- guy. Points to the current position in the stack. Don't alter unless
- you REALLY know what you are doing.
-
- SEGMENT REGISTERS:
- DS: Data segment- all data read are from the segment pointed to be this
- segment register unless a segment overide is used.
- Used as source segment for movs, lods
- This segment also can be thought of as the "Default Segment" because
- if no segment override is present, DS is assumed to be the segmnet
- you want to grab the data from.
-
- ES: Extra Segment- this segment is used as the destination segment
- for movs, stos
- Can be used as just another segment... You need to specify [ES:░░]
- to use this segment.
-
- FS: (386+) No particular reason for it's name... I mean, we have CS, DS,
- and ES, why not make the next one FS? :) Just another segment..
-
- GS: (386+) Same as FS
-
-
- OTHERS THAT YOU SHOULDN'T OR CAN'T CHANGE:
- CS: Segment that points to the next instruction- can't change directly
- IP: Offset pointer to the next instruction- can't even access
- The only was to change CS or IP would be through a JMP, CALL, or RET
-
- SS: Stack segment- don't mess with it unless you know what you're
- doing. Changing this will probably crash the computer. This is the
- segment that the STACK resides in.
-
- ────────
- Heck, as long as I've mentioned it, lets look at the STACK:
-
- The STACK is an area of memory that has the properties of a STACK of
- plates- the last one you put on is the first one take off. The only
- difference is that the stack of plates is on the roof. (Ok, so that
- can't really happen... unless gravity was shut down...) Meaning that
- as you put another plate (or piece of data) on the stack, the STACK grows
- DOWNWARD. Meaning that the stack pointer is DECREASED after each PUSH,
- and INCREASED after each POP.
-
- _____ Top of the allocated memory in the stack segment (SS)
- ■
- ■
- ■
- ■ « SP (the stack pointer points to the most recently pushed byte)
-
- Truthfully, you don't need to know much more than a stack is Last In,
- First Out (LIFO).
-
- WRONG ex: push cx ;this swaps the contents of CX and AX
- push ax ;of course, if you wanted to do this quicker, you'd
- ...
- pop cx ;just say XCHG cx,ax
- pop ax ; but thats not my point.
-
- RIGHT ex: push cx ;this correctly restores AX & CX
- push ax
- ...
- pop ax
- pop cx
-
- ────────────
-
- Now I'll do a quick run through on the assembler instructions that you MUST
- know:
-
- ────
- MOV:
-
- Examples of different addressing modes:
-
- MOV ax,5 ;moves and IMMEDIATE value into ax (think 'AX = 5')
- MOV bx,cx ;moves a register into another register
- MOV cx,[SI] ;moves [DS:SI] into cx (the Default Segment is Used)
- MOV [DI+5],ax ;moves ax into [DS:DI+5]
- MOV [ES:DI+BX+34],al ;same as above, but has a more complicated
- ;OFFSET (=DI+BX+34) and a SEGMENT OVERRIDE
- MOV ax,[546] ;moves whats at [DS:546] into AX
-
- Note that the last example would be totally different if the brackets
- were left out. It would mean that an IMMEDIATE value of 546 is put into
- AX, instead of what's at offset 546 in the Default Segment.
-
- ANOTHER STANDARD NOTATION TO KNOW:
- Whenever you see brackets [] around something, it means that it refers to
- what is AT that offset. For instance, say you had this situation:
-
- ────────────
- MyData dw 55
- ...
- mov ax,MyData
- ────────────
-
- What is that supposed to mean? Is MyData an Immediate Value? This is
- confusing and for our purposes WRONG. The 'Correct' way to do this would
- be:
-
- ────────────
- MyData dw 55
- ...
- mov ax,[MyData]
- ────────────
-
- This is clearly moving what is AT the address of MyData, which would be
- 55, and not moving the OFFSET of MyData itself. But what if you
- actually wanted the OFFSET? Well, you must specify directly.
-
- ────────────
- MyData dw 55
- ...
- mov ax,OFFSET MyData
- ────────────
-
- Similiarly, if you wanted the SEGMENT that MyData was in, you'd do this:
-
- ────────────
- MyData dw 55
- ...
- mov ax,SEG MyData
- ────────────
-
- ────────────────────────
- INT:
- Examples:
- INT 21h ;calls DOS standard interrupt # 21h
- INT 10h ;the Video BIOS interrupt..
-
- INT is used to call a subroutine that performs some function that you'd
- rather not write yourself. For instance, you would use a DOS interrupt
- to OPEN a file. You would similiarly use the Video BIOS interrupt to
- set the screen mode, move the cursor, or to do any other function that
- would be difficult to program.
-
- Which subroutine the interrupt preforms is USUALLY specified by AH.
- For instance, if you wanted to print a message to the screen you'd
- use INT 21h, subfunction 9 by doing this:
-
- ────────────
- mov ah,9
- int 21h
- ────────────
-
- Yes, it's that easy. Of course, for that function to do anything, you
- need to specify WHAT to print. That function requires that you have
- DS:DX be a FAR pointer that points to the string to display. This string
- must terminate with a dollar sign. Here's an example:
-
- ────────────
- MyMessage db "This is a message!$"
- ...
- mov dx,OFFSET MyMessage
- mov ax,SEG MyMessage
- mov ds,ax
- mov ah,9
- int 21h
- ...
- ────────────
-
- The DB, like the DW (and DD) merely declares the type of a piece of data.
-
- DB => Declare Byte (I think of it as 'Data Byte')
- DW => Declare Word
- DD => Declare Dword
-
- Also, you may have noticed that I first put the segment value into AX
- and then put it into DS. I did that because the 80x86 does NOT allow
- you to put an immediate value into a segment register. You can, however,
- pop stuff into a Segment register or mov an indexed value into the
- segment register. A few examples:
-
- ────────────
- LEGAL:
- mov ax,SEG MyMessage
- mov ds,ax
-
- push SEG Message
- pop ds
-
- mov ds,[SegOfMyMessage]
- ;where [SegOfMyMessage] has already been loaded with
- ; the SEGMENT that MyMessage resides in
- ILLEGAL:
- mov ds,10
- mov ds,SEG MyMessage
- ────────────
-
- Well, that's about it for what you need to know to get started...
-
- ────────────────────────────────────────────────────────────────────────
- And now the FRAME for an ASSEMBLER program.
- ──────────────────────────────────────────────────────────────────────
-
- The Basic Frame for an Assembler program using Turbo Assembler simplified
- directives is:
-
- ;===========-
-
- DOSSEG ;This arranges the segments in order according DOS standards
- ;CODE, DATA, STACK
- .MODEL SMALL ;dont worry about this yet
- .STACK 200h ;tells the compiler to put in a 200h byte stack
- .CODE ;starts code segment
-
- ASSUME CS:@CODE, DS:@CODE
-
- START: ;generally a good name to use as an entry point
-
- mov ax,4c00h
- int 21h
-
- END START
-
- ;===========- By the way, a semicolon means the start of a comment.
-
- If you were to enter this program and TASM & TLINK it, it would execute
- perfectly. It will do absolutly nothing, but it will do it well.
-
- What it does:
- Upon execution, it will jump to START. move 4c00h into AX,
- and call the DOS interrupt, which exits back to DOS.
-
- Outout seen: NONE
- ────────────────────────────────────────────────────────────────────────
-
- That's nice, eh? If you've understood the majority of what was presented
- in this document, you are ready to start programming!
-
- See ASM0.TXT and ASM0.ASM to continue this wonderful assembler stuff...
-
-
- Written By Draeden/VLA
-
-
-
-